Empirical Knowledge Representation Generation Using N-Gram Clustering

Author

  • Robin Collier
Abstract

System Overview

The approach acquires a domain-specific semantic representation through stochastic analysis of a corpus. It draws on sets of conceptually similar paragraphs. The corpus and the semantic representation are then used to generate schematic structures, which concisely store the knowledge contained in existing texts. New texts are processed to update the knowledge base dynamically: any novel concept encountered is analysed, and a new structure is added to the representation. A more comprehensive explanation of the system, with references to related work, is presented in (Collier 1994).
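To make the idea of grouping conceptually similar paragraphs concrete, the following is a minimal sketch only, not the method described in (Collier 1994). It assumes word bigrams as the n-grams, Jaccard overlap as the similarity measure, and a greedy single-pass grouping; the function names and the `threshold` value are hypothetical choices made for illustration. A paragraph that matches no existing group starts a new one, loosely mirroring how a novel concept leads to a new structure.

```python
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def word_bigrams(text: str) -> Set[Bigram]:
    """Return the set of adjacent word pairs in a lower-cased text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def jaccard(a: Set[Bigram], b: Set[Bigram]) -> float:
    """Jaccard overlap of two bigram sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_paragraphs(paragraphs: List[str],
                       threshold: float = 0.2) -> List[List[int]]:
    """Greedy single-pass grouping of paragraphs by bigram overlap.

    Each paragraph joins the best-matching existing cluster if its
    similarity reaches `threshold` (an assumed value); otherwise it
    starts a new cluster. Returns lists of paragraph indices.
    """
    clusters: List[List[int]] = []
    profiles: List[Set[Bigram]] = []  # union of bigrams seen per cluster
    for idx, para in enumerate(paragraphs):
        grams = word_bigrams(para)
        best, best_score = None, threshold
        for c, profile in enumerate(profiles):
            score = jaccard(grams, profile)
            if score >= best_score:
                best, best_score = c, score
        if best is None:
            # No sufficiently similar cluster: treat as a novel group.
            clusters.append([idx])
            profiles.append(set(grams))
        else:
            clusters[best].append(idx)
            profiles[best] |= grams
    return clusters


if __name__ == "__main__":
    texts = [
        "the valve controls the flow of water",
        "the valve controls the flow of oil",
        "stock prices fell sharply on monday",
    ]
    print(cluster_paragraphs(texts))
```

In this sketch the first two sample paragraphs share most of their bigrams and fall into one group, while the third shares none and opens a new group. A real system would likely need a larger corpus, smoothing, and a more careful similarity measure.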

Similar resources

Natural Language Generation for Text-to-Text Applications Using an Information-Slim Representation

I propose a representation formalism and algorithms to be used in a new language generation mechanism for text-to-text applications. The generation process is driven by both text-specific information encoded via probability distributions over words and phrases derived from the input text, and general language knowledge captured by n-gram and syntactic language models. A Text-to-Text Perspective...

A Systematic Study on Document Representation and Dimensionality Reduction for Text Clustering

Increasingly large text datasets and the high dimensionality associated with natural language are a great challenge in text mining. In this research, a systematic study is conducted of the application of three Dimension Reduction Techniques (DRT) to three different document representation methods in the context of the text clustering problem, using several standard benchmark datasets. The dimensional...

Assessing Two-Mode Semantic Network Story Representations Using a False Memory Paradigm

This paper describes a novel method of representing semantic networks of stories (and other text) as a two-mode graph. This method has some advantages over traditional one-mode semantic networks, but has the potential drawback (shared with n-gram text networks) that it contains paths that are not present in the text. An empirical study was devised using a false memory paradigm to determine whet...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM

With the large and rapidly growing number of documents available in digital form (Internet, library, CD-ROM...), the automatic classification of texts has become a significant research field and a fundamental task in document processing. This paper deals with the unsupervised classification of textual documents, also called text clustering, using Kohonen Self-Organizing Maps in two new situations:...

Publication date: 1994